inappropriate image
Universal Prompt Optimizer for Safe Text-to-Image Generation
Wu, Zongyu, Gao, Hongcheng, Wang, Yueze, Zhang, Xiang, Wang, Suhang
Text-to-Image (T2I) models have shown great performance in generating images from textual prompts. However, these models are vulnerable to unsafe inputs that elicit unsafe content such as sexual, harassment, and illegal-activity images. Existing approaches based on image checkers, model fine-tuning, and embedding blocking are impractical in real-world applications. Hence, we propose the first universal prompt optimizer for safe T2I generation (POSI) in the black-box scenario. We first construct a dataset of toxic-clean prompt pairs using GPT-3.5 Turbo. To guide the optimizer to convert toxic prompts into clean ones while preserving semantic information, we design a novel reward function that measures the toxicity and text alignment of the generated images, and we train the optimizer with Proximal Policy Optimization. Experiments show that our approach effectively reduces the likelihood of various T2I models generating inappropriate images, with no significant impact on text alignment. It can also be flexibly combined with other methods for better performance. Our code is available at https://github.com/wzongyu/POSI.
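The optimization target described above, a reward that scores the image generated from the rewritten prompt for both safety and fidelity to the original prompt, can be pictured with a short sketch. This is a minimal illustration under assumptions, not the authors' implementation: the CLIP-based alignment score, the placeholder toxicity_score, and the weight alpha are hypothetical stand-ins for the paper's actual reward terms.

```python
# Sketch of a POSI-style reward: favor images that stay aligned with the
# user's original prompt while scoring low on toxicity. Illustrative only.
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_alignment(image, original_prompt: str) -> float:
    """Cosine similarity between the generated image and the ORIGINAL
    prompt, so semantics survive the toxic-to-clean rewrite."""
    inputs = processor(text=[original_prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = clip(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).item()

def toxicity_score(image) -> float:
    """Hypothetical hook: any image safety classifier mapping to [0, 1]."""
    raise NotImplementedError

def reward(image, original_prompt: str, alpha: float = 1.0) -> float:
    # Higher reward = image matches the user's intent and is judged safe.
    return text_alignment(image, original_prompt) - alpha * toxicity_score(image)
```

In a PPO loop, this scalar would serve as the return for the policy (the prompt rewriter), with the T2I model treated as a black box mapping the rewritten prompt to an image.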
AI-generated child pornography is circulating. This California prosecutor wants to make it illegal.
After several reports of artificial intelligence-generated child pornography surfaced in California, Ventura County Dist. Atty. Erik Nasarenko advocated for a change to state law to protect children, who are increasingly vulnerable to this misuse of technology. Last December, Nasarenko received his first tip regarding a person who had artificially created photos depicting an underage girl performing sex acts with an adult man. "When it came to my attention, I said let's file [charges]," Nasarenko told The Times. But because of current loopholes in California law, he learned that he couldn't press charges in cases where the photos of children are AI-generated.
Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models?
Tsai, Yu-Lin, Hsu, Chia-Yi, Xie, Chulin, Lin, Chih-Hsun, Chen, Jia-You, Li, Bo, Chen, Pin-Yu, Yu, Chia-Mu, Huang, Chun-Ying
Diffusion models for text-to-image (T2I) synthesis, such as Stable Diffusion (SD), have recently demonstrated exceptional capabilities for generating high-quality content. However, this progress has raised several concerns about potential misuse, particularly in creating copyrighted, prohibited, and restricted content, or NSFW (not safe for work) images. While efforts have been made to mitigate such problems, either by implementing a safety filter at the evaluation stage or by fine-tuning models to eliminate undesirable concepts or styles, the effectiveness of these safety measures against a wide range of prompts remains largely unexplored. In this work, we investigate these safety mechanisms by proposing a novel concept-retrieval algorithm for evaluation. We introduce Ring-A-Bell, a model-agnostic red-teaming tool for T2I diffusion models, whose entire evaluation can be prepared in advance without prior knowledge of the target model. Specifically, Ring-A-Bell first performs concept extraction to obtain holistic representations of sensitive and inappropriate concepts. By leveraging the extracted concept, Ring-A-Bell then automatically identifies problematic prompts that lead diffusion models to generate inappropriate content, allowing the user to assess the reliability of deployed safety mechanisms. Finally, we empirically validate our method by testing online services such as Midjourney and various concept-removal methods. Our results show that Ring-A-Bell, by manipulating safe prompting benchmarks, can transform prompts that were originally regarded as safe so that they evade existing safety mechanisms, revealing defects in these so-called safety mechanisms that could in practice lead to the generation of harmful content.
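As a rough picture of the concept-extraction step the abstract mentions, one can estimate a concept vector from prompt pairs that differ only in whether the sensitive concept is present, then use it to steer a benign prompt's embedding. This is a hedged sketch, not Ring-A-Bell's code: the pairing scheme, the scale eta, and the choice of CLIP text encoder are assumptions, and the discrete prompt search that recovers actual tokens from the steered embedding is omitted.

```python
# Sketch of extracting a "concept vector" from paired prompts and using it
# to steer a safe prompt's embedding toward the sensitive concept.
import torch
from transformers import CLIPTextModelWithProjection, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
encoder = CLIPTextModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")

def embed(prompts):
    tokens = tokenizer(prompts, padding=True, return_tensors="pt")
    with torch.no_grad():
        return encoder(**tokens).text_embeds  # (N, dim)

def concept_vector(with_concept, without_concept):
    """Mean embedding difference over paired prompts, e.g.
    ("a violent street brawl", "a street scene")."""
    return (embed(with_concept) - embed(without_concept)).mean(dim=0)

def steered_target(safe_prompt: str, c: torch.Tensor, eta: float = 3.0):
    """Target embedding; a discrete search (not shown) would then look for
    token sequences whose encoding lands near this point."""
    return embed([safe_prompt])[0] + eta * c
```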
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
Schramowski, Patrick, Brack, Manuel, Deiseroth, Björn, Kersting, Kristian
Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the internet, they also suffer, as we demonstrate, from degenerated and biased human behavior. In turn, they may even reinforce such biases. To help combat these undesired side effects, we present safe latent diffusion (SLD). To measure the inappropriate degeneration due to unfiltered and imbalanced training sets, we establish a novel image-generation test bed, inappropriate image prompts (I2P), containing dedicated, real-world text-to-image prompts covering concepts such as nudity and violence. As our exhaustive empirical evaluation demonstrates, the introduced SLD removes and suppresses inappropriate image parts during the diffusion process, with no additional training required and no adverse effect on overall image quality or text alignment.
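The mechanism summarized above can be read as an extra guidance direction at each denoising step: alongside classifier-free guidance toward the user's prompt, the noise estimate is pushed away from an "inappropriate concept" prompt. The sketch below is a simplification under that reading; the published SLD formulation additionally uses element-wise scaling, warm-up steps, and momentum, all omitted here.

```python
# Simplified safety-guided noise estimate in the spirit of SLD.
import torch

def sld_noise_estimate(eps_uncond: torch.Tensor,
                       eps_prompt: torch.Tensor,
                       eps_unsafe: torch.Tensor,
                       guidance_scale: float = 7.5,
                       safety_scale: float = 1.0) -> torch.Tensor:
    """eps_* are U-Net noise predictions at the current step for the empty
    prompt, the user prompt, and the inappropriate-concept prompt."""
    prompt_dir = eps_prompt - eps_uncond   # usual CFG direction
    unsafe_dir = eps_unsafe - eps_uncond   # direction toward unsafe content
    return eps_uncond + guidance_scale * (prompt_dir - safety_scale * unsafe_dir)
```

For practical use, the diffusers library ships a ready-made pipeline for this method (StableDiffusionPipelineSafe), which avoids re-implementing the full scheme by hand.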